Code Snippets

Annotate your variants

This example shows how to annotate variants. The last call returns a pandas DataFrame with one row per variant, indexed by variant ID.

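import missionbio.mosaic as ms

# Load the example dataset
sample = ms.load_example_dataset("3 cell mix")
# IDs of the variants that pass the default filters
filtered_variants = list(sample.dna.filter_variants())

# Annotate the first five filtered variants
sample.dna.get_annotations(filtered_variants[0:5])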
| Variant ID | Position | Ref allele | Alt allele | Varsome url | Variant type | RefSeq transcript id | Gene | Protein | cDNA | Coding impact | Function | Allele Freq (gnomAD) | DANN | dbSNP rsids | ClinVar | COSMIC ids |
| chr2:25458546:C/T | 25458546 | C | T | https://varsome.com/variant/hg19/chr2:25458546... | SNV | NM_022552.5 | DNMT3A | | c.2597+30G>A | | intronic | 0.513776 | 0.551042 | rs2304429 | Benign | |
| chr2:25469502:C/T | 25469502 | C | T | https://varsome.com/variant/hg19/chr2:25469502... | SNV | NM_022552.5 | DNMT3A | DNMT3A:p.L422= | c.1266G>A | synonymous | coding | 0.190004 | 0.802194 | rs2276598 | Benign | |
| chr2:25470426:C/T | 25470426 | C | T | https://varsome.com/variant/hg19/chr2:25470426... | SNV | NM_022552.5 | DNMT3A | | c.1014+34G>A | | intronic | 0.000702 | 0.635212 | rs142243425 | | |
| chr2:25470573:G/A | 25470573 | G | A | https://varsome.com/variant/hg19/chr2:25470573... | SNV | NM_022552.5 | DNMT3A | DNMT3A:p.R301W | c.901C>T | missense | coding | | 0.999237 | rs1553414070 | Conflicting Interpretations Of Pathogenicity | |
| chr2:209113192:G/A | 209113192 | G | A | https://varsome.com/variant/hg19/chr2:20911319... | SNV | NM_005896.4 | IDH1 | IDH1:p.G105= | c.315C>T | synonymous | coding | 0.05057 | 0.680929 | rs11554137 | Benign | |
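The variant IDs used as the index follow a chrom:position:ref/alt pattern, as in chr2:25458546:C/T. If you want to pre-select variants by location before calling get_annotations (for example, only those in a region of interest), you need the components separately. The sketch below is a minimal parser; parse_variant_id and Variant are hypothetical helpers, not part of Mosaic, and they assume every ID has exactly this shape. The 30 Mb window is an arbitrary example.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str


def parse_variant_id(variant_id: str) -> Variant:
    """Split an ID such as 'chr2:25458546:C/T' into its parts.

    Hypothetical helper; assumes the 'chrom:pos:ref/alt' layout shown above.
    """
    chrom, pos, alleles = variant_id.split(":")
    ref, alt = alleles.split("/")
    return Variant(chrom, int(pos), ref, alt)


ids = ["chr2:25458546:C/T", "chr2:209113192:G/A"]
parsed = {vid: parse_variant_id(vid) for vid in ids}

# Keep variants on chr2 below 30 Mb (an arbitrary example window)
window = [vid for vid, v in parsed.items() if v.chrom == "chr2" and v.pos < 30_000_000]
print(window)  # ['chr2:25458546:C/T']
```

The shortened list could then be passed to sample.dna.get_annotations in place of filtered_variants[0:5].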

Currently we provide the following annotations:

  • Variant type

  • Gene

  • Gene function

  • RefSeq transcript ID

  • cDNA change

  • Protein change

  • Protein coding impact

  • COSMIC

  • DANN SNVs

  • ClinVar

  • gnomAD

  • dbSNP

If you need additional annotation sources, please contact us.
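Because the annotations come back as a table, you can narrow them down with ordinary filtering. The sketch below holds a few columns copied from the output above in plain Python dictionaries so that it runs without Mosaic; with the real DataFrame you would express the same conditions through pandas. The selection rule (coding variants that ClinVar does not classify as Benign) is only an illustration, and the lowercase keys are simplified stand-ins for the real column names.

```python
# A few columns copied from the annotation table above, one dict per variant
annotations = [
    {"id": "chr2:25458546:C/T", "gene": "DNMT3A", "function": "intronic", "clinvar": "Benign"},
    {"id": "chr2:25469502:C/T", "gene": "DNMT3A", "function": "coding", "clinvar": "Benign"},
    {"id": "chr2:25470426:C/T", "gene": "DNMT3A", "function": "intronic", "clinvar": ""},
    {"id": "chr2:25470573:G/A", "gene": "DNMT3A", "function": "coding",
     "clinvar": "Conflicting Interpretations Of Pathogenicity"},
    {"id": "chr2:209113192:G/A", "gene": "IDH1", "function": "coding", "clinvar": "Benign"},
]

# Illustrative rule: coding variants that ClinVar does not call Benign
candidates = [
    row["id"]
    for row in annotations
    if row["function"] == "coding" and row["clinvar"] != "Benign"
]
print(candidates)  # ['chr2:25470573:G/A']
```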

Multi-assay heatmap

The following examples show how to produce a heatmap that combines data from multiple assays. The first example clusters cells by protein expression, then draws a heatmap of per-cluster protein expression, DNA mutation status for selected variants, and copy number for selected chromosomes.

import missionbio.mosaic as ms

sample = ms.load_example_dataset("3 cell mix")

# first, cluster by protein expression
sample.protein.normalize_reads()
sample.protein.run_pca(attribute='normalized_counts', components=5)
sample.protein.run_umap(attribute='pca')
sample.protein.cluster(attribute='pca', method='graph-community', k=100)

# show only two DNA variants, and chromosomes 1 and 2 for CNV
sample.heatmap(("protein", "dna", "cnv"), features=(None, sample.dna.ids()[:2], ["1", "2"]))
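Here normalize_reads() is called with its defaults, and this page does not state which method that uses. A common normalization for antibody counts is the centered log-ratio (CLR) transform, sketched below for a single cell as log(x + 1) minus that cell's mean of the same values. Treat it as an illustration of the idea, not as Mosaic's implementation; in particular, centering per cell rather than per antibody is an assumption here, and the counts are made up.

```python
import math


def clr_per_cell(counts):
    """Centered log-ratio transform of one cell's antibody counts.

    Uses a pseudocount of 1 so zero counts stay finite:
    clr_i = log(x_i + 1) - mean_j(log(x_j + 1)).
    """
    logs = [math.log(x + 1) for x in counts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


cell = [0, 9, 99]  # hypothetical read counts for three antibodies
normalized = clr_per_cell(cell)
print([round(v, 3) for v in normalized])
```

By construction the transformed values of a cell sum to zero, so each value says how far one antibody sits above or below that cell's typical level, independent of the cell's total read depth.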

If you cluster by protein, you can also quantify the percentage of mutated cells in each cluster for the mutations of your choice.

import missionbio.mosaic as ms

sample = ms.load_example_dataset("3 cell mix")

# first, cluster by protein expression
sample.protein.normalize_reads()
sample.protein.run_pca(attribute='normalized_counts', components=5)
sample.protein.run_umap(attribute='pca')
sample.protein.cluster(attribute='pca', method='graph-community', k=100)

# show only two DNA variants, and chromosomes 1 and 2 for CNV
sample.signaturemap(("protein", "dna", "cnv"), features=(None, sample.dna.ids()[:2], ["1", "2"]))
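The per-cluster mutation percentage is a simple ratio: among the cells in a cluster that have a genotype call for the variant, the share that are heterozygous or homozygous alternate. The sketch assumes the genotype encoding 0 = WT, 1 = HET, 2 = HOM, 3 = missing (an assumption about Mosaic's genotype layer) and leaves missing calls out of the denominator; Mosaic's own figures may treat missing cells differently. The function name and the example data are hypothetical.

```python
from collections import defaultdict

# Assumed genotype encoding: 0 = WT, 1 = HET, 2 = HOM, 3 = missing
WT, HET, HOM, MISSING = 0, 1, 2, 3


def percent_mutated(labels, genotypes):
    """Percentage of genotyped cells per cluster that carry the variant.

    labels: cluster label for each cell.
    genotypes: genotype call for one variant, in the same cell order.
    Cells with a missing call are left out of the denominator.
    """
    called = defaultdict(int)
    mutated = defaultdict(int)
    for label, gt in zip(labels, genotypes):
        if gt == MISSING:
            continue
        called[label] += 1
        if gt in (HET, HOM):
            mutated[label] += 1
    return {label: 100 * mutated[label] / called[label] for label in called}


labels = ["A", "A", "A", "A", "B", "B", "B"]
genotypes = [HET, HOM, WT, MISSING, WT, WT, HET]
result = percent_mutated(labels, genotypes)
print(result)
```

Cluster A has three called cells, two of them mutated, so it reports about 66.7%; cluster B reports about 33.3%. A cluster whose cells are all missing for the variant does not appear in the result at all, rather than showing a misleading 0%.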

Fishplot to visualize clonal evolution

This example draws a fish plot together with a graphical representation of the clonal phylogeny. Currently, you must supply the phylogenetic relationships between clones yourself, through the parents argument.

import missionbio.mosaic as ms

group = ms.load_example_dataset('Multisample PBMC')
group.fishplot(labels=["Clone 1", "Clone 2"], parents=[None, None])
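The parents argument encodes the phylogeny; in the example, parents=[None, None] gives both clones no parent. Before plotting, it can help to check that the relationships you typed in form a valid tree. The sketch below assumes each entry of parents is either None (a root clone) or the index of another clone in labels; that reading of the argument is an assumption, not something this page confirms, and children_of is a hypothetical helper.

```python
def children_of(parents):
    """Map each clone index to the indices of its direct children.

    Assumes parents[i] is None (a root clone) or the index of clone i's parent.
    Raises ValueError for out-of-range indices, self-parents, or cycles.
    """
    n = len(parents)
    children = {i: [] for i in range(n)}
    for child, parent in enumerate(parents):
        if parent is None:
            continue
        if not 0 <= parent < n or parent == child:
            raise ValueError(f"invalid parent {parent!r} for clone {child}")
        children[parent].append(child)

    # Walk up from every clone; revisiting a clone means the links form a cycle
    for start in range(n):
        seen = set()
        node = start
        while node is not None:
            if node in seen:
                raise ValueError(f"cycle involving clone {node}")
            seen.add(node)
            node = parents[node]
    return children


labels = ["Clone 1", "Clone 2", "Clone 3"]
parents = [None, 0, 0]  # hypothetical: clones 2 and 3 descend from clone 1
tree = children_of(parents)
for parent, kids in tree.items():
    print(labels[parent], "->", [labels[k] for k in kids])
```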

Group cells by selected DNA variants

This example clusters cells into clones based on the provided variants and returns a DataFrame of per-clone and per-variant statistics. The algorithm also takes allele dropout (ADO) into account to identify potential false-positive clones.

import missionbio.mosaic as ms

sample = ms.load_example_dataset('3 cell mix')
filtered_variants = sample.dna.filter_variants()
sample.dna.group_by_genotype(features=filtered_variants[0:5])
| clone | 1 | 2 | 3 | 4 | 5 | Missing GT clones (123) | Small subclones (14) | ADO clones (0) |
| chr2:25458546:C/T | Het (51.44%) | WT (0.67%) | WT (0.6%) | WT (0.56%) | WT (0.57%) | Missing in 14.54% of cells | WT (16.94%) | NaN |
| chr2:25469502:C/T | WT (0.1%) | Het (52.98%) | WT (0.19%) | WT (0.0%) | WT (0.16%) | Missing in 43.17% of cells | WT (26.55%) | NaN |
| chr2:25470426:C/T | WT (0.69%) | WT (0.44%) | Het (64.08%) | Hom (99.61%) | WT (0.51%) | Missing in 26.90% of cells | WT (6.99%) | NaN |
| chr2:25470573:G/A | Het (50.85%) | WT (0.72%) | WT (0.9%) | WT (0.45%) | WT (0.65%) | Missing in 22.13% of cells | WT (15.26%) | NaN |
| chr2:209113192:G/A | WT (0.63%) | WT (0.63%) | Het (50.88%) | Het (52.1%) | WT (1.09%) | Missing in 6.70% of cells | WT (20.56%) | NaN |
| Total Cell Number | 290 (11.71%) | 220 (8.89%) | 193 (7.79%) | 30 (1.21%) | 25 (1.01%) | 1653 (66.76%) | 65 (2.63%) | NaN |
| 3 cell mix Cell Number | 290 (11.71%) | 220 (8.89%) | 193 (7.79%) | 30 (1.21%) | 25 (1.01%) | 1653 (66.76%) | 65 (2.63%) | NaN |
| Parents | NaN | NaN | NaN | [3] | [small, 2] | NaN | NaN | NaN |
| Sisters | NaN | NaN | NaN | [small] | [small, small] | NaN | NaN | NaN |
| ADO score | 0 | 0 | 0 | 0.912037 | 0.953065 | NaN | NaN | NaN |
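The grouping idea can be illustrated in a few lines: cells with an identical genotype across the selected variants form a clone, cells with any missing call are set aside, and clones below a size threshold are pooled as small subclones. The second function is a deliberately simple ADO check: a clone is flagged as a possible dropout artifact of another clone when every difference between them is at a site where the other clone is heterozygous and the flagged clone is WT or homozygous, which is what losing one allele would produce. Both functions, the genotype encoding (0 = WT, 1 = HET, 2 = HOM, 3 = missing), and the min_size threshold are assumptions for illustration; this is not how Mosaic computes its ADO score.

```python
from collections import Counter

# Assumed genotype encoding: 0 = WT, 1 = HET, 2 = HOM, 3 = missing
MISSING = 3


def group_cells(genotypes, min_size):
    """Group cells by their genotype signature across the selected variants.

    genotypes: one tuple of calls per cell, one call per variant.
    Returns (clones, n_missing, n_small): clones maps each signature with at
    least min_size cells to its cell count.
    """
    clones, n_missing, n_small = {}, 0, 0
    for signature, n in Counter(map(tuple, genotypes)).items():
        if MISSING in signature:
            n_missing += n
        elif n < min_size:
            n_small += n
        else:
            clones[signature] = n
    return clones, n_missing, n_small


def could_be_ado(child, parent):
    """Simplified check: child differs from parent only at sites where the
    parent is HET and the child is WT or HOM, as one dropped allele would cause."""
    diffs = [(c, p) for c, p in zip(child, parent) if c != p]
    return bool(diffs) and all(p == 1 and c in (0, 2) for c, p in diffs)


# Signatures of clones 3 and 4 from the table above (WT = 0, Het = 1, Hom = 2)
clone_3 = (0, 0, 1, 0, 1)
clone_4 = (0, 0, 2, 0, 1)
print(could_be_ado(clone_4, clone_3))  # True
```

This agrees with the table listing clone 3 as a parent of clone 4, but the check is coarse: it would also flag clone 5 (WT at every site) as a possible dropout product of clones 1, 2, and 3, whereas the table lists only a small subclone and clone 2 as its parents.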